
mon: increase cache sizes #24247

Merged
merged 4 commits into from Oct 14, 2018

Conversation

liewegas (Member)

These two changes should mitigate the luminous->mimic upgrade disaster recently experienced by a user with a ~1000 node cluster.

I think these new defaults are reasonable, but comments welcome!

Longer term, I think we need a strategy for dynamically sizing these caches based on the size of the cluster.
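
For operators who want these values ahead of an upgrade, a minimal ceph.conf sketch of what the new defaults amount to (mon_osd_cache_size is named in the merged commit list below; rocksdb_cache_size is, to my knowledge, the option behind the rocksdb change):

    [mon]
    # Cache 500 osdmaps instead of the old default of 10.
    mon_osd_cache_size = 500
    # 512 MB rocksdb block cache (value is in bytes).
    rocksdb_cache_size = 536870912

On a running Mimic cluster the same values can be applied with `ceph config set mon <option> <value>`.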

@liewegas (Member Author)

retest this please

@gregsfortytwo (Member) left a comment

This isn't an unreasonable amount of memory to expect a monitor to have, but right now they are often quite thin daemons (once you give them an SSD, anyway...), and the OSDMap cache default is low because there were issues with monitors OOMing. So we're adding another ~500 MB (or more?) of memory requirements to the monitor on reasonably-sized clusters, which isn't trivial.

Similarly, we've been saying 1 GB/TB on OSDs for a long time, but that messaging was quite confused for users of FileStore OSDs (I've certainly said on the list that I had no idea where it came from; I think it was initially made up without justification by some doc writer before we later decided it was a good idea). I'm less worried about that change, though, assuming we don't backport it.

So generally I'm fine with changing these values, but we need at least some napkin math demonstrating they aren't significant changes, or else we need to message them pretty loudly.
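
For rough scale, one back-of-the-envelope reading of the above (the ~1 MB per full osdmap on a ~1000-OSD cluster is an illustrative assumption, not a measured figure):

    500 cached osdmaps × ~1 MB/map  ≈ 500 MB   (mon_osd_cache_size)
    rocksdb block cache             ≈ 512 MB   (rocksdb_cache_size)
    worst-case additional mon RAM   ≈ ~1 GB on a large cluster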

@liewegas (Member Author) commented Oct 5, 2018

I pushed commits that update the hardware recommendations about RAM in the docs. I also included a release note.

The hardware recommendations definitely need a refresh--probably much more than I did here. (I would prefer not to block this critical fix to our defaults with a long conversation about the hardware recs, though!)

Metadata servers (ceph-mds)
---------------------------

The manager daemon memory utilization depends on how much memory its cache is

typo - s/manager/metadata/

@gregsfortytwo (Member)

Yeah, sounds reasonable; I like the doc changes.

Reviewed-by: Greg Farnum <gfarnum@redhat.com>

@gregsfortytwo (Member)

@liewegas note there’s a merge conflict now though. :(

10 maps is too small to enable all mon sessions to keep abreast of the
latest maps, especially if OSDs are down for any period of time during an
upgrade.

Note that this is quite a bit larger, but the mon's memory usage will
scale proportionally to the size of the cluster: 500 small osdmaps do not
amount to a significant amount of RAM, while conversely a large cache
matters most on a large cluster, and those mons will generally have
plenty of RAM available.

Someday we should control this with a memory envelope like we do with the
OSDs, but that remains future work.

Signed-off-by: Sage Weil <sage@redhat.com>
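
The "memory envelope like we do with the OSDs" refers to the bluestore autotuning target. As a sketch of the OSD-side mechanism being alluded to (osd_memory_target is a real option; a mon-side equivalent did not exist at the time of this PR):

    # Ask each OSD to keep its total memory footprint near 4 GiB;
    # bluestore shrinks or grows its caches to fit this target.
    ceph config set osd osd_memory_target 4294967296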
For filestore OSDs, this is probably a good idea anyway, and is generally
not going to be hugely impactful on the memory footprint (where users
have been told to provide 1 GB RAM per 1 TB storage for a long time now).

For bluestore OSDs, this value is meaningless, as we autotune it anyway.

For mons, this is a more reasonable default.

Signed-off-by: Sage Weil <sage@redhat.com>
liewegas merged commit 2e66381 into ceph:master Oct 14, 2018
liewegas added a commit that referenced this pull request Oct 14, 2018
* refs/pull/24247/head:
	PendingReleaseNotes: add note about increased mon memory footprint
	doc/start/hardware-recommendations: refresh recommendations for RAM
	rocksdb: increase default cache size to 512 MB
	mon: mon_osd_cache_size = 500 (from 10)

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
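
To confirm the merged defaults on a running build, querying the options through the admin socket should work (a sketch; mon.a is a placeholder daemon name):

    ceph daemon mon.a config get mon_osd_cache_size
    ceph daemon mon.a config get rocksdb_cache_size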